Document Image Collection Using Amazon's Mechanical Turk

Authors

  • Audrey N. Le
  • Jerome Ajot
  • Mark A. Przybocki
  • Stephanie Strassel
Abstract

We present findings from a collaborative effort aimed at testing the feasibility of using Amazon’s Mechanical Turk as a data collection platform to build a corpus of document images. Experimental design and implementation workflow are described. Preliminary findings and directions for future work are also discussed.

Similar resources

Paragraph Acquisition and Selection for List Question Using Amazon's Mechanical Turk

Creating more fine-grained annotated data than previously relevant document sets is important for evaluating individual components in automatic question answering systems. In this paper, we describe using Amazon’s Mechanical Turk (AMT) to judge whether paragraphs in relevant documents answer corresponding list questions in TREC QA track 2004. Based on AMT results, we build a collection of 1...

DP2: Distributed 3D image segmentation using micro-labor workforce

This application note describes a new scalable semi-automatic approach, the Dual Point Decision Process, for segmentation of 3D structures contained in 3D microscopy. The segmentation problem is distributed to many individual workers such that each receives only simple questions regarding whether two points in an image are placed on the same object. A large pool of micro-labor workers a...

Using Amazon's Mechanical Turk for Annotating Medical Named Entities.

Amazon's Mechanical Turk (AMT) service is becoming increasingly popular in Natural Language Processing (NLP) research. In this poster, we report our findings in using AMT to annotate biomedical text extracted from clinical trial descriptions with three entity types: medical condition, medication, and laboratory test. We also describe our observations on AMT workers' annotations.

Creating Speech and Language Data With Amazon's Mechanical Turk

In this paper we give an introduction to using Amazon’s Mechanical Turk crowdsourcing platform for the purpose of collecting data for human language technologies. We survey the papers published in the NAACL2010 Workshop. Twenty-four researchers participated in the workshop’s shared task to create data for speech and language applications with $100.

Establishing a Database for Studying Human Face Photograph Memory

Contemporary visual environments bombard us with hundreds of face images every day, and this places a nontrivial demand on long-term memory. However, little is known about what makes certain faces remain in our memories, while others are quickly forgotten. To establish a basis for face memorability exploration, we assembled a database of 8,690 face photographs from online sources, spanning dive...

Publication date: 2010